    Malware Classification based on Call Graph Clustering

    Each day, anti-virus companies receive tens of thousands of samples of potentially harmful executables. Many of the malicious samples are variations of previously encountered malware, created by their authors to evade pattern-based detection. Dealing with these large amounts of data requires robust, automatic detection approaches. This paper studies malware classification based on call graph clustering. By representing malware samples as call graphs, it is possible to abstract certain variations away and to detect structural similarities between samples. The ability to cluster similar samples together makes more generic detection techniques possible, which target the commonalities of the samples within a cluster. To compare call graphs with one another, we compute pairwise graph similarity scores via graph matchings that approximately minimize the graph edit distance. Next, to facilitate the discovery of similar malware samples, we employ several clustering algorithms, including k-medoids and DBSCAN. Clustering experiments are conducted on a collection of real malware samples, and the results are evaluated against manual classifications provided by human malware analysts. Experiments show that it is indeed possible to accurately detect malware families via call graph clustering. We anticipate that, in the future, call graphs can be used to analyse the emergence of new malware families and, ultimately, to automate the implementation of generic detection schemes.
    Comment: This research has been supported by TEKES - the Finnish Funding Agency for Technology and Innovation as part of its ICT SHOK Future Internet research programme, grant 40212/0
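The graph edit distance behind these similarity scores can be illustrated with a minimal Python sketch. It assumes unlabeled directed call graphs given as a vertex list plus a set of (caller, callee) edges, and unit costs for inserting or deleting a vertex or an edge; both the representation and the cost model are assumptions, not the paper's. The function names are hypothetical. `graph_edit_distance` searches every vertex mapping exhaustively, so it is exact but only feasible for tiny graphs, which is why the abstract relies on matchings that approximately minimize this quantity instead.

```python
from itertools import permutations

# Assumed representation: (vertices, edges), where edges are (caller, callee) pairs.
CallGraph = tuple[list[str], set[tuple[str, str]]]


def mapping_cost(g1: CallGraph, g2: CallGraph, mapping: dict) -> int:
    """Unit-cost edit cost induced by a vertex mapping from g1 into g2.

    mapping[v] is a vertex of g2, or None if v is deleted. Every vertex or
    edge insertion/deletion costs 1; vertex labels are ignored.
    """
    v1, e1 = g1
    v2, e2 = g2
    deleted = sum(1 for v in v1 if mapping[v] is None)
    inserted = len(v2) - (len(v1) - deleted)  # g2 vertices left unmatched
    # An edge is preserved when both endpoints are mapped and the image edge exists.
    preserved = sum(
        1
        for a, b in e1
        if mapping[a] is not None
        and mapping[b] is not None
        and (mapping[a], mapping[b]) in e2
    )
    return deleted + inserted + (len(e1) - preserved) + (len(e2) - preserved)


def graph_edit_distance(g1: CallGraph, g2: CallGraph) -> int:
    """Exact GED by exhaustive search over mappings; tiny graphs only."""
    v1, _ = g1
    v2, _ = g2
    # Extra None slots let any g1 vertex be deleted instead of matched.
    targets = list(v2) + [None] * len(v1)
    best = None
    for image in permutations(targets, len(v1)):
        cost = mapping_cost(g1, g2, dict(zip(v1, image)))
        if best is None or cost < best:
            best = cost
    return best


if __name__ == "__main__":
    # Hypothetical call graphs: a sample and a variant with one function fewer.
    sample = (["main", "parse", "send"], {("main", "parse"), ("parse", "send")})
    variant = (["f0", "f1"], {("f0", "f1")})
    print(graph_edit_distance(sample, variant))  # 2
```

The variant needs one vertex deletion and one edge deletion, hence a distance of 2. A similarity score could then be derived from such distances, but the exact normalization is not specified here.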

    JGraphT -- A Java library for graph data structures and algorithms

    Mathematical software and graph-theoretical algorithmic packages that efficiently model, analyze and query graphs are crucial in an era where large-scale spatial, societal and economic network data are abundantly available. One such package is JGraphT, a programming library that contains efficient and generic graph data structures along with a large collection of state-of-the-art algorithms. The library is written in Java with stability, interoperability and performance in mind. A distinctive feature of this library is the ability to model vertices and edges as arbitrary objects, thereby permitting natural representations of many common networks, including transportation, social and biological networks. Besides classic graph algorithms such as shortest-path and spanning-tree algorithms, the library contains numerous advanced algorithms: graph and subgraph isomorphism; matching and flow problems; approximation algorithms for NP-hard problems such as independent set and TSP; and several more exotic algorithms such as Berge graph detection. Due to its versatility and generic design, JGraphT is currently used in large-scale commercial, non-commercial and academic research projects. In this work we describe in detail the design and underlying structure of the library, and discuss its most important features and algorithms. A computational study is conducted to evaluate the performance of JGraphT against a number of similar libraries. Experiments on a large number of graphs over a variety of popular algorithms show that JGraphT is highly competitive with other established libraries such as NetworkX or the Boost Graph Library (BGL).
    Comment: Major Revision
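JGraphT itself is a Java library; the following is not its API but a minimal Python sketch of two ideas the abstract mentions: vertices modelled as arbitrary (here, hashable) objects, and a classic shortest-path algorithm. Dijkstra's algorithm is used, which assumes non-negative edge weights. The class names `Station` and `WeightedDigraph` are hypothetical.

```python
import heapq
from collections.abc import Hashable
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Station:
    """Hypothetical vertex type: any hashable object can serve as a vertex."""
    name: str


class WeightedDigraph:
    """Minimal adjacency-map digraph (illustrative; not JGraphT's API)."""

    def __init__(self):
        self.adj: dict[Hashable, dict[Hashable, float]] = {}

    def add_vertex(self, v):
        self.adj.setdefault(v, {})

    def add_edge(self, u, v, weight: float):
        self.add_vertex(u)
        self.add_vertex(v)
        self.adj[u][v] = weight

    def shortest_path(self, source, target):
        """Dijkstra (non-negative weights); returns (distance, path) or (inf, [])."""
        dist = {source: 0.0}
        prev = {}
        tie = count()  # tie-breaker so the heap never compares vertex objects
        heap = [(0.0, next(tie), source)]
        done = set()
        while heap:
            d, _, u = heapq.heappop(heap)
            if u in done:
                continue  # stale heap entry
            done.add(u)
            if u == target:
                break  # target's distance is final once popped
            for v, w in self.adj.get(u, {}).items():
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(heap, (nd, next(tie), v))
        if target not in dist:
            return float("inf"), []
        path = [target]
        while path[-1] != source:
            path.append(prev[path[-1]])
        return dist[target], path[::-1]


if __name__ == "__main__":
    g = WeightedDigraph()
    a, b, c = Station("A"), Station("B"), Station("C")
    g.add_edge(a, b, 4.0)
    g.add_edge(b, c, 1.0)
    g.add_edge(a, c, 7.0)
    print(g.shortest_path(a, c))  # (5.0, [Station(name='A'), Station(name='B'), Station(name='C')])
```

The counter in each heap entry is a design choice that keeps the queue working even for vertex objects that define no ordering.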

    Decomposition Approaches for Optimization Problems

    This dissertation encompasses the development of decomposition approaches for a variety of both real-world and fundamental optimization problems. Many optimization problems consist of multiple interconnected subproblems, often rendering them too large or too complicated to solve as a single integral problem. Decomposition approaches are required to deal with these problems efficiently. By decomposing a problem into multiple subproblems, efficient dedicated procedures can be employed to solve the subproblems independently. Furthermore, strong bounds on the optimal solutions can often be derived by exploiting structures in the underlying subproblems.
    This work primarily focuses on analyzing and identifying problem components in order to decompose a problem into multiple, easier-to-solve subproblems. The actual decompositions are obtained through mathematical techniques such as Column Generation and Benders decomposition, relying on Integer Programming, Constraint Programming, heuristic and combinatorial procedures to solve the resulting subproblems. Each solution method is developed with scalability and extendability in mind, while remaining sufficiently robust to account for changes to the original problem definitions. Moreover, the decomposition strategies are designed to preserve a notion of optimality, thereby providing insight into the quality of a solution.
    From an application point of view, the present work is centered around four routing and scheduling problems: the School Bus Routing Problem (SBRP), the Concrete Delivery Problem (CDP), the Time-Dependent TSP (TD-TSP) and the Balanced TSP (BTSP). For each of these problems, decomposition strategies have been developed. The SBRP and BTSP are solved via a branch-and-price framework; lower bounds on the SBRP are derived through Lagrangian Relaxation. A Benders decomposition is developed for the CDP; the resulting subproblems are efficiently solved through Integer and Constraint Programming, in combination with a fast scheduling heuristic. Finally, a generic, robust Constraint Programming approach, strengthened with Multivariate Decision Diagrams, is implemented for the TD-TSP. To improve domain propagation, bounds derived from alternative problem relaxations are incorporated in the CP search through an additive bounding procedure. To validate the aforementioned solution approaches, experiments are conducted on real-world or simulated data.
    By decomposing a problem, techniques from various interdisciplinary domains can be combined into an integrated solution approach. Correlations between the problems under consideration, as well as between the proposed solution methodologies, provide insight into the applicability, limitations and intuition behind the various techniques. It is exactly these insights that will ultimately lead to fully automated problem solvers, capable of analyzing and decomposing optimization problems without human intervention.
    status: published
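The dissertation derives SBRP lower bounds through Lagrangian Relaxation; its SBRP model is not reproduced here. To illustrate the general technique only, the minimal Python sketch below applies it to a tiny set-covering problem, a swapped-in example. Relaxing the covering constraints with multipliers λ ≥ 0 yields a bound that, by weak duality, never exceeds the optimal cost, and subgradient ascent searches for good multipliers. The function names, the diminishing step-size rule and the instance are all assumptions.

```python
from itertools import product


def lagrangian_bound(costs, sets, lam):
    """Evaluate L(lam) for set cover: min sum_j c_j x_j, x binary, every element covered.

    Relaxing each covering constraint sum_{j: i in S_j} x_j >= 1 with a
    multiplier lam[i] >= 0 gives
        L(lam) = sum_i lam[i] + sum_j min(0, c_j - sum_{i in S_j} lam[i]),
    a lower bound on the optimal cover cost. Returns (bound, minimising x).
    """
    bound = sum(lam)
    x = []
    for c, s in zip(costs, sets):
        reduced = c - sum(lam[i] for i in s)  # reduced cost of this set
        x.append(1 if reduced < 0 else 0)  # take a set only if it lowers L
        bound += min(0.0, reduced)
    return bound, x


def subgradient_ascent(costs, sets, n_elements, iters=100, step=1.0):
    """Best Lagrangian bound found with the diminishing step size step / k."""
    lam = [0.0] * n_elements
    best = float("-inf")
    for k in range(1, iters + 1):
        bound, x = lagrangian_bound(costs, sets, lam)
        best = max(best, bound)
        # Subgradient: 1 minus how many chosen sets cover element i.
        g = [1 - sum(xj for xj, s in zip(x, sets) if i in s) for i in range(n_elements)]
        # Step along the subgradient, then project back onto lam >= 0.
        lam = [max(0.0, li + (step / k) * gi) for li, gi in zip(lam, g)]
    return best


def optimal_cover_cost(costs, sets, n_elements):
    """Exact optimum by enumeration, for checking the bound on tiny instances."""
    best = float("inf")
    for x in product((0, 1), repeat=len(costs)):
        covered = set().union(*(s for s, xj in zip(sets, x) if xj))
        if covered >= set(range(n_elements)):
            best = min(best, sum(c for c, xj in zip(costs, x) if xj))
    return best


if __name__ == "__main__":
    costs = [3, 2, 2, 4]
    sets = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]
    print(subgradient_ascent(costs, sets, 4), optimal_cover_cost(costs, sets, 4))
```

The relaxed problem separates per set, which is what makes each evaluation of L cheap; the same separability argument is what motivates Lagrangian bounds in larger routing models.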

    Malware Detection Through Call Graphs

    Each day, anti-virus companies receive large quantities of potentially harmful executables. Many of the malicious samples among these executables are variations of previously encountered malware, created by their authors to evade pattern-based detection. Consequently, robust detection approaches are required that can recognize similar samples automatically.
    In this thesis, malware detection through call graphs is studied. In a call graph, the functions of a binary executable are represented as vertices, and the calls between those functions as edges. By representing malware samples as call graphs, it is possible to derive and detect structural similarities between multiple samples. These similarities can be used to implement generic malware detection schemes, which can proactively detect existing versions of the malware as well as future releases with similar characteristics.
    To compare call graphs with one another, we compute pairwise graph similarity scores via graph matchings that minimize an objective function known as the Graph Edit Distance. Finding exact graph matchings is intractable for large call graph instances, so we investigate several efficient approximation algorithms. Next, to facilitate the discovery of similar malware samples, we employ several clustering algorithms, including variations on the k-medoids and DBSCAN clustering algorithms. Clustering experiments are conducted on a collection of real malware samples, and the results are evaluated against manual classifications provided by virus analysts from F-Secure Corporation. Experiments show that it is indeed possible to accurately detect malware families using the DBSCAN clustering algorithm. Based on our results, we anticipate that in the future it will be possible to use call graphs to analyse the emergence of new malware families and, ultimately, to automate the implementation of generic protection schemes for malware families.
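Once pairwise distances between call graphs are available (for example, one minus a similarity score, which is an assumption for illustration), DBSCAN can run directly on the distance matrix. The following minimal Python sketch assumes a symmetric precomputed matrix `dist`; `eps` and `min_pts` follow the usual DBSCAN definitions, and the function name is hypothetical. It is not the thesis's implementation, which also studied k-medoids variants.

```python
def dbscan(dist, eps, min_pts):
    """DBSCAN on a precomputed distance matrix.

    Returns one label per sample: 0, 1, ... for clusters, -1 for noise.
    A point is a core point if at least min_pts samples (itself included)
    lie within distance eps of it.
    """
    n = len(dist)
    NOISE = -1
    labels = [None] * n  # None marks an unvisited sample
    cluster = -1

    def neighbours(p):
        return [q for q in range(n) if dist[p][q] <= eps]

    for p in range(n):
        if labels[p] is not None:
            continue
        seeds = neighbours(p)
        if len(seeds) < min_pts:
            labels[p] = NOISE  # may later become a border point of a cluster
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in seeds if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster  # border point: reachable but not core
                continue
            if labels[q] is not None:
                continue  # already assigned to a cluster
            labels[q] = cluster
            q_neighbours = neighbours(q)
            if len(q_neighbours) >= min_pts:
                queue.extend(q_neighbours)  # q is core: keep expanding
    return labels


if __name__ == "__main__":
    # Hypothetical 1-D positions standing in for call-graph distances.
    pos = [0.0, 0.1, 0.2, 5.0, 5.1, 10.0]
    dist = [[abs(a - b) for b in pos] for a in pos]
    print(dbscan(dist, eps=0.5, min_pts=2))  # [0, 0, 0, 1, 1, -1]
```

Unlike k-medoids, this procedure does not need the number of clusters in advance and can leave outlying samples unassigned as noise. As in standard DBSCAN, a border point reachable from two clusters goes to whichever cluster reaches it first.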

    JGraphT Release v1.2.0

    JGraphT Release v1.4.0
